Classification of Protein Sequences Using Markov Models

Author

  • Claus Thomsen
Abstract

Preface
This study was completed as a master's thesis at the Institute of Informatics and Mathematical Modeling (IMM), Technical University of Denmark (DTU). The project ran from September 1st, 2003 to March 1st, 2004 and is rated at 35 points. Associate Professor Paul Fischer, also of IMM, DTU, supervised the project. The report deals with a general classification problem in bioinformatics, namely secondary structure prediction. Since we do not have a biological background, the problem is treated as a mathematical classification problem. The report presents the analyses conducted, their results, the conclusions drawn and the implementations, and it describes the analyses, methods and ideas in enough detail for the tests and results in this text to be reproduced. The report is organized chronologically and may be read from beginning to end, although some of the more theoretical sections are closely tied to the corresponding result sections. Several notations are used in the report. Literature references are given as a number in square brackets [] in a smaller font size, like [1]. Footnotes appear in superscript, usually at the end of a sentence, like this¹. Equations, formulas and other expressions are numbered (X.Y), denoting the Yth expression in section X.

Abstract
This project deals with a specific classification problem in bioinformatics and biology. The problem, typically referred to as secondary structure prediction, concerns how protein sequences may be classified into a number of predefined structure classes. This project analyses the possible use of Markov models for this classification problem. Markov models are statistical models that may be used to infer the structure classes of protein sequences from training data.
The performance of the developed models is compared with other known models in the area, specifically the GOR models, which resemble Markov models in that both are statistical models. The results show that Markov models can be used for secondary structure prediction with better accuracy than simply guessing the most frequent structure class. Starting from a simple Markov model that predicts around 51% of the structures correctly, the model has been extended and combined with other methods, reaching a prediction accuracy of 57.2% (an increase of around 6 percentage points). …
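The abstract does not spell out the model structure, so the following is only a minimal sketch of one plausible formulation, not the thesis's own method. It assumes a first-order hidden Markov model whose hidden states are three structure classes (H = helix, E = strand, C = coil; this class set is an assumption), whose emissions are the 20 standard amino acids, whose parameters are estimated by counting over labelled training data with additive smoothing, and which labels a new sequence by Viterbi decoding. It also computes per-residue accuracy and the "always guess the most frequent class" baseline that the abstract compares against. The function names `train`, `viterbi`, `accuracy` and `majority_baseline`, and the toy sequences, are hypothetical.

```python
import math
from collections import Counter

# Assumed three-class alphabet and the 20 standard residues; sequences containing
# other characters would raise KeyError in this sketch.
STATES = "HEC"
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def train(sequences, structures, pseudocount=1.0):
    """Estimate log start, transition and emission probabilities by counting
    over labelled (sequence, structure) pairs, with additive smoothing."""
    start = Counter()
    trans = {s: Counter() for s in STATES}
    emit = {s: Counter() for s in STATES}
    for seq, ss in zip(sequences, structures):
        start[ss[0]] += 1
        for i, (aa, st) in enumerate(zip(seq, ss)):
            emit[st][aa] += 1
            if i > 0:
                trans[ss[i - 1]][st] += 1

    def log_normalise(counter, keys):
        # Add the pseudocount to every key so unseen events keep non-zero probability.
        total = sum(counter[k] + pseudocount for k in keys)
        return {k: math.log((counter[k] + pseudocount) / total) for k in keys}

    log_start = log_normalise(start, STATES)
    log_trans = {s: log_normalise(trans[s], STATES) for s in STATES}
    log_emit = {s: log_normalise(emit[s], AMINO_ACIDS) for s in STATES}
    return log_start, log_trans, log_emit


def viterbi(seq, log_start, log_trans, log_emit):
    """Return the most probable structure string for an amino-acid sequence."""
    scores = [{s: log_start[s] + log_emit[s][seq[0]] for s in STATES}]
    back = []  # back[i][s] = best predecessor of state s at position i + 1
    for aa in seq[1:]:
        prev = scores[-1]
        row, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + log_trans[p][s])
            row[s] = prev[best] + log_trans[best][s] + log_emit[s][aa]
            ptr[s] = best
        scores.append(row)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))


def accuracy(predicted, true):
    """Fraction of residues whose predicted class matches the true class."""
    hits = sum(p == t for p, t in zip(predicted, true))
    return hits / len(true)


def majority_baseline(structures):
    """Accuracy of always guessing the most frequent class in `structures`."""
    counts = Counter("".join(structures))
    return counts.most_common(1)[0][1] / sum(counts.values())


if __name__ == "__main__":
    # Hypothetical toy training data, far too small for real use.
    train_seqs = ["MKVLAAGEELLK", "TVKVEVTVSG"]
    train_ss = ["CHHHHHHHHHHC", "CEEEECEEEC"]
    params = train(train_seqs, train_ss)
    predicted = viterbi("MKVLAAG", *params)
    print(predicted, accuracy(predicted, "CHHHHHH"))
    print("majority-class baseline:", majority_baseline(train_ss))
```

Working in log space avoids numerical underflow on long sequences, and the pseudocount keeps residue/class pairs absent from the training data from being assigned zero probability. A useful model should score above `majority_baseline` on held-out data, which is the comparison the abstract reports.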


Similar resources

A generalization of Profile Hidden Markov Model (PHMM) using one-by-one dependency between sequences

The Profile Hidden Markov Model (PHMM) can be poor at capturing dependency between observations because of the statistical assumptions it makes. To overcome this limitation, the dependency between residues in a multiple sequence alignment (MSA) which is the representative of a PHMM can be combined with the PHMM. Based on the fact that sequences appearing in the final MSA are written based on th...


Evaluation of First and Second Markov Chains Sensitivity and Specificity as Statistical Approach for Prediction of Sequences of Genes in Virus Double Strand DNA Genomes

The growing amount of information on biological sequences has made statistical approaches necessary for modeling and estimating their functions. In this paper, the sensitivity and specificity of first- and second-order Markov chains for gene prediction were evaluated using the complete double-stranded DNA virus. There were two approaches for prediction of each Markov Model parameter,...


Asymmetric Effects of Monetary Policy and Business Cycles in Iran using Markov-switching Models

This paper investigates the asymmetric effects of monetary policy on economic growth over business cycles in Iran. The models are estimated using the Hamilton (1989) Markov-switching model and data for 1960-2012; the results clearly identify two regimes, characterized as expansion and recession. Moreover, the results show that an expansionary monetary policy has a positive and statist...


Comparing the Bidirectional Baum-Welch Algorithm and the Baum-Welch Algorithm on Regular Lattice

A profile hidden Markov model (PHMM) is widely used in assigning protein sequences to protein families. In this model, the hidden states only depend on the previous hidden state and observations are independent given hidden states. In other words, in the PHMM, only the information of the left side of a hidden state is considered. However, it makes sense that considering the information of the b...


Hidden Markov Model for protein secondary structure

We address the problem of protein secondary structure prediction with Hidden Markov Models. A 21-state model is built using biological knowledge and statistical analysis of sequence motifs in regular secondary structures. Sequence family information is integrated via the combination of independent predictions of homologous sequences and a weighting scheme. Prediction accuracy with single sequen...



Publication date: 2004